 perceptual difference



On The Classification-Distortion-Perception Tradeoff

Neural Information Processing Systems

Signal degradation is ubiquitous, and the computational restoration of degraded signals has been investigated for many years. It was recently reported that the capability of signal restoration is fundamentally limited by the so-called perception-distortion tradeoff, i.e., the distortion and the perceptual difference between the restored signal and the ideal "original" signal cannot both be minimized simultaneously. Distortion corresponds to signal fidelity and perceptual difference corresponds to perceptual naturalness, both of which are important metrics in practice. There is also another dimension worth considering: the semantic quality of the restored signal, i.e., the utility of the signal for recognition purposes. In particular, we consider the classification error rate achieved on the restored signal using a predefined classifier as a representative metric for semantic quality.
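As a rough illustration of two of the quantities named in this abstract, the sketch below computes a distortion and a classification error rate on a toy example. It assumes mean squared error as the distortion measure and uses a hypothetical `threshold_classifier` as the predefined classifier; neither choice is taken from the paper, and the perceptual difference is not computed here.

```python
import numpy as np


def mse_distortion(original, restored):
    """Mean squared error between original and restored signals (assumed distortion measure)."""
    original = np.asarray(original, dtype=float)
    restored = np.asarray(restored, dtype=float)
    return float(np.mean((original - restored) ** 2))


def classification_error_rate(classifier, restored, labels):
    """Fraction of restored signals that a fixed, predefined classifier labels incorrectly."""
    predictions = np.array([classifier(x) for x in restored])
    return float(np.mean(predictions != np.asarray(labels)))


def threshold_classifier(signal):
    """Hypothetical predefined classifier: class 1 if the signal mean is positive, else class 0."""
    return int(np.mean(signal) > 0.0)


# Toy data: four two-sample signals, their true labels, and imperfect restorations.
originals = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]])
labels = np.array([1, 0, 1, 0])
restored = np.array([[0.5, 0.5], [-0.5, -0.5], [-0.5, 0.5], [0.5, -0.5]])

distortion = mse_distortion(originals, restored)
error_rate = classification_error_rate(threshold_classifier, restored, labels)
print(f"distortion={distortion:.3f}, classification error rate={error_rate:.3f}")
```

The classifier is held fixed while the restoration varies, which matches the abstract's description of measuring semantic quality with a predefined classifier.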


Do humans and machines have the same eyes? Human-machine perceptual differences on image classification

arXiv.org Artificial Intelligence

One motivation for neural networks (NN) is to create artificial intelligence that can learn from human intelligence and mimic human behavior. In computer vision, researchers often build their work on the assumption that neural networks learn a feature representation similar to visual cortex activity [3, 53, 37]. It is believed that a well-trained network learns to represent input stimuli in a way that is similar to human visual perception [9]. As a result, most current work in computer vision that aims to develop better computer models focuses on benchmark scores (e.g., prediction accuracy) and ignores evaluations of human-machine similarity.

Figure 1: Upper: Our study aims to understand the perceptual difference between human and machine classifiers. Lower: We empirically demonstrate the benefit of utilizing their perceptual difference with a post-hoc collaboration.


Maximal Jacobian-based Saliency Map Attack

arXiv.org Machine Learning

The Jacobian-based Saliency Map Attack (JSMA) is a family of adversarial attack methods for fooling classification models, such as deep neural networks for image classification tasks. By saturating a few pixels in a given image to their maximum or minimum values, JSMA can cause the model to misclassify the resulting adversarial image as a specified erroneous target class. We propose two variants of JSMA: one that removes the requirement to specify a target class, and another that additionally does not need to specify whether to only increase or only decrease pixel intensities. Our experiments highlight the competitive speeds and qualities of these variants when applied to datasets of hand-written digits and natural scenes.
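To make the idea of a saliency map concrete, here is a minimal sketch of a simplified, single-feature saliency score for the targeted, intensity-increasing case, as the original JSMA is commonly described. It is not the variants proposed in this abstract, and it omits the pairwise pixel search. The Jacobian values, the input `x`, the function name `jsma_saliency_increase`, and the pixel range [0, 1] are all illustrative assumptions.

```python
import numpy as np


def jsma_saliency_increase(jacobian, target):
    """Simplified per-feature saliency for increasing features toward a target class.

    jacobian[c, i] is assumed to hold d(output of class c) / d(input feature i).
    A feature scores above zero only if raising it increases the target output
    and decreases the summed outputs of all other classes.
    """
    jacobian = np.asarray(jacobian, dtype=float)
    target_grad = jacobian[target]
    other_grad = jacobian.sum(axis=0) - target_grad
    useful = (target_grad > 0) & (other_grad < 0)
    return np.where(useful, target_grad * np.abs(other_grad), 0.0)


# Hypothetical Jacobian for a 3-class model with 3 input features.
jacobian = np.array([
    [0.2, -0.1, 0.5],
    [-0.3, 0.4, 0.1],
    [0.1, -0.3, -0.2],
])

saliency = jsma_saliency_increase(jacobian, target=0)
best_feature = int(np.argmax(saliency))

# Saturate the most salient feature to the assumed maximum intensity of 1.0.
x = np.array([0.3, 0.6, 0.2])
x_adv = x.copy()
x_adv[best_feature] = 1.0
print(saliency, best_feature, x_adv)
```

A full attack would recompute the Jacobian after each perturbation and repeat until the model predicts the target class or a perturbation budget is exhausted.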


Collaborative Language Grounding Toward Situated Human-Robot Dialogue

AI Magazine

To enable situated human-robot dialogue, techniques to support grounded language communication are essential. One particular challenge is to ground human language to the robot's internal representation of the physical world. Although copresent in a shared environment, humans and robots have mismatched capabilities in reasoning, perception, and action. Their representations of the shared environment and joint tasks are significantly misaligned. Humans and robots will need to make extra effort to bridge the gap and strive for a common ground of the shared world. Only then is the robot able to engage in language communication and joint tasks. Thus, computational models for language grounding will need to take collaboration into consideration. A robot not only needs to incorporate collaborative effort from human partners to better connect human language to its own representation, but also needs to make extra collaborative effort to communicate its representation in language that humans can understand. To address these issues, the Language and Interaction Research group (LAIR) at Michigan State University has investigated multiple aspects of collaborative language grounding. This article gives a brief introduction to this research effort and discusses several collaborative approaches to grounding language to perception and action.


Learning to Mediate Perceptual Differences in Situated Human-Robot Dialogue

AAAI Conferences

In human-robot dialogue, although a robot and its human partner are co-present in a shared environment, they have significantly mismatched perceptual capabilities (e.g., recognizing objects in the surroundings). When a shared perceptual basis is missing, it becomes difficult for the robot to identify referents in the physical world that are referred to by the human (i.e., a problem of referential grounding). To overcome this problem, we have developed an optimization-based approach that allows the robot to detect and adapt to perceptual differences. Through online interaction with the human, the robot can learn a set of weights indicating how reliably or unreliably each dimension of its perception of the environment (e.g., object type, object color) maps to the human's linguistic descriptors, and thus adjust its word models accordingly. Our empirical evaluation has shown that this weight-learning approach can successfully adjust the weights to reflect the robot's perceptual limitations. The learned weights, together with updated word models, can lead to a significant improvement for referential grounding in future dialogues.